Finding Structural Correspondences from Bilingual Parsed Corpus for Corpus-based Translation
Abstract
In this paper, we describe a system and methods for finding structural correspondences from the paired dependency structures of a source sentence and its translation in a target language. The system first finds word correspondences and then finds phrasal correspondences based on those word correspondences. We have also developed a GUI system with which a user can check and correct the correspondences retrieved by the system. These structural correspondences will be used as raw translation patterns in a corpus-based translation system.
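The two-stage idea in the abstract (word correspondences first, then phrasal correspondences built on them) can be illustrated with a minimal sketch. The abstract does not say how either stage is computed, so everything here is an assumption: word links come from a hypothetical bilingual `dictionary`, a "phrase" is taken to be a dependency subtree, and a source subtree is paired with the smallest target subtree covering its linked words, kept only if no word in that target subtree links outside the source subtree. The names `Node`, `find_word_correspondences`, and `find_phrase_correspondences` are illustrative, not the authors' code, and the sketch assumes each word occurs only once per sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dependency-tree node; its subtree is treated as the phrase it heads."""
    word: str  # assumption: words are unique within a sentence
    children: list = field(default_factory=list)


def all_nodes(node):
    # Pre-order traversal over every node in the tree.
    yield node
    for child in node.children:
        yield from all_nodes(child)


def subtree_words(node):
    # Set of words in the phrase (subtree) headed by `node`.
    return {n.word for n in all_nodes(node)}


def find_word_correspondences(src_root, tgt_root, dictionary):
    """Stage 1 (assumed): link a source word to a target word when the
    hypothetical bilingual `dictionary` lists it as a translation."""
    tgt_words = subtree_words(tgt_root)
    return {
        (w, t)
        for w in subtree_words(src_root)
        for t in dictionary.get(w, ())
        if t in tgt_words
    }


def find_phrase_correspondences(src_root, tgt_root, word_pairs):
    """Stage 2 (assumed): map each source subtree to the smallest target
    subtree covering its linked words, keeping the pair only if every word
    link into that target subtree comes from inside the source subtree."""
    results = []
    for s in all_nodes(src_root):
        s_words = subtree_words(s)
        linked = {t for (w, t) in word_pairs if w in s_words}
        if not linked:
            continue  # no anchor words, so nothing to project
        # The target root always covers every linked word, so this is non-empty.
        candidates = [t for t in all_nodes(tgt_root) if linked <= subtree_words(t)]
        best = min(candidates, key=lambda t: len(subtree_words(t)))
        t_words = subtree_words(best)
        if all(w in s_words for (w, t) in word_pairs if t in t_words):
            results.append((s.word, best.word))
    return results


# Toy example: "she reads a book" and a romanized Japanese counterpart.
src = Node("reads", [Node("she"), Node("book", [Node("a")])])
tgt = Node("yomu", [Node("kanojo"), Node("hon")])
dictionary = {"she": ["kanojo"], "reads": ["yomu"], "book": ["hon"]}
pairs = find_word_correspondences(src, tgt, dictionary)
print(find_phrase_correspondences(src, tgt, pairs))
# [('reads', 'yomu'), ('she', 'kanojo'), ('book', 'hon')]
```

The consistency check is what keeps a phrase pair from absorbing words that belong to a different source phrase; in a real system, the GUI described above would be where a user corrects links that such heuristics get wrong.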
Similar Resources
Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation
This paper describes a system for finding phrasal translation correspondences from a parallel parsed corpus, that is, a collection of paired English and Japanese sentences. First, the system finds phrasal correspondences by consulting a Japanese-English translation dictionary. Then, the system finds correspondences in the remaining phrases by using sentence dependency structures and the balance of all co...
Automatic Construction of Translation Knowledge for Corpus-based Machine Translation
Many machine translation (MT) systems that utilize the knowledge automatically acquired from bilingual corpora have been proposed in conjunction with efforts to accumulate corpora. We call this approach corpus-based machine translation in this thesis. This thesis focuses on automatic construction of the translation knowledge needed for corpus-based MT and discusses the following three tasks. 1....
Semi-automatic Compilation of Bilingual Lexicon Entries from Cross-Lingually Relevant News Articles on WWW News Sites
To overcome the resource scarcity bottleneck in research on corpus-based translation knowledge acquisition, this paper takes the approach of semi-automatically acquiring domain-specific translation knowledge from collections of bilingual news articles on WWW news sites. This paper presents the results of applying standard co-occurrence frequency-based techniques for estimating bilingual...
Bilingual lexicon extraction for a distant language pair using a small parallel corpus
The aim of this thesis proposal is to perform bilingual lexicon extraction in cases where only small parallel corpora are available and a monolingual corpus is hard to obtain for at least one of the languages. Moreover, the languages are typologically distant and no bilingual seed lexicon is available. We focus on the language pair Spanish-Nahuatl; we propose to work with morpheme bas...
Sub-Sentential Alignment Method by Analogy
This paper describes a method for searching for word correspondences between pairs of translated sentences. In Example-Based Machine Translation, translation patterns can be extracted easily once word correspondences between pairs of translated sentences are defined. Popular methods for aligning bilingual corpora at the sub-sentential level are unable to produce reliable results when the size of...
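The entry on semi-automatic compilation of bilingual lexicon entries mentions standard co-occurrence frequency-based techniques, but its excerpt does not name the measure. As one common example of such a technique (an assumption, not necessarily what that paper uses), the Dice coefficient scores a candidate term pair by how often the two terms appear together across paired articles. The function name `dice_scores` and the toy romanized data are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(article_pairs):
    """Score candidate term translations with the Dice coefficient.

    `article_pairs` is a list of (source_terms, target_terms) tuples, one per
    pair of cross-lingually relevant articles. For a term pair (s, t):
        Dice(s, t) = 2 * f(s, t) / (f(s) + f(t))
    where f counts the article pairs in which the term(s) occur.
    """
    f_src, f_tgt, f_joint = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in article_pairs:
        # Count each term at most once per article pair.
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        f_src.update(src_set)
        f_tgt.update(tgt_set)
        f_joint.update(product(src_set, tgt_set))
    return {(s, t): 2 * n / (f_src[s] + f_tgt[t]) for (s, t), n in f_joint.items()}


# Hypothetical toy data: English terms paired with romanized Japanese terms.
news_pairs = [
    (["election", "tokyo"], ["senkyo", "toukyou"]),
    (["election"], ["senkyo"]),
    (["tokyo"], ["toukyou"]),
]
scores = dice_scores(news_pairs)
print(scores[("election", "senkyo")])   # 1.0
print(scores[("election", "toukyou")])  # 0.5
```

Terms that always co-occur score 1.0, while a pair that co-occurs only by coincidence scores lower, which is why such scores are usually thresholded or ranked before a human confirms the lexicon entries.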